Unsupervised topic adaptation for morph-based speech recognition

Authors

  • André Mansikkaniemi
  • Mikko Kurimo
Abstract

Topic adaptation in automatic speech recognition (ASR) refers to adapting the language model (LM) and vocabulary to improve recognition of in-domain speech data. In this work we implement unsupervised topic adaptation for morph-based ASR to improve recognition of foreign entity names. Based on the first-pass ASR hypothesis, similar texts are selected from a collection of articles and used to adapt the background language model. Latent semantic indexing is used to index the adaptation corpus and the ASR output. We evaluate three types of index terms and their usefulness in unsupervised LM adaptation: statistical morphs, words, and a combination of morphs and words. Furthermore, we implement vocabulary adaptation alongside unsupervised LM adaptation. Foreign word candidates are selected from the in-domain texts based on how likely they are to be topic-related foreign entity names. Adapted pronunciation rules are generated for the selected foreign words. Morpheme adaptation is also performed by restoring over-segmented foreign words to their base forms, to ensure more reliable pronunciation modeling.
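The text-selection step described in the abstract can be illustrated with a minimal sketch of latent semantic indexing. This is not the authors' implementation: it assumes whitespace-separated index terms (which could be words or morphs), raw term counts without weighting, NumPy for the truncated SVD, and cosine similarity for ranking. The function names `term_doc_matrix` and `select_similar` and the toy articles are hypothetical.

```python
import numpy as np

def term_doc_matrix(docs, vocab):
    """Raw count matrix: rows are index terms, columns are documents."""
    row = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            if tok in row:  # terms outside the vocabulary are ignored
                A[row[tok], j] += 1.0
    return A

def select_similar(hypothesis, docs, k=2, top_n=2):
    """Rank docs by LSI cosine similarity to a first-pass ASR hypothesis.

    Assumption: k does not exceed the rank of the term-document matrix,
    so the k retained singular values are non-zero.
    """
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    A = term_doc_matrix(docs, vocab)
    # Truncated SVD: A is approximated by U_k * diag(s_k) * V_k^T.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, docs_k = U[:, :k], s[:k], Vt[:k, :].T
    # Fold the hypothesis into the latent space: q_k = q^T U_k diag(s_k)^-1.
    q = term_doc_matrix([hypothesis], vocab)[:, 0]
    q_k = (q @ U_k) / s_k
    # Cosine similarity between the hypothesis and every document.
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k) + 1e-12
    sims = (docs_k @ q_k) / norms
    order = np.argsort(-sims)[:top_n]
    return [int(i) for i in order], sims

# Hypothetical article collection and first-pass hypothesis.
articles = [
    "obama visits berlin",
    "berlin hosts obama summit",
    "hockey team wins game",
    "team loses hockey final",
]
selected, similarities = select_similar("obama berlin summit", articles)
```

In this toy collection the two politics articles share no terms with the two sports articles, so a two-dimensional latent space separates the topics and the hypothesis is matched to the first two articles. The selected texts would then serve as adaptation data for the background language model; how that adaptation is carried out is not specified here.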

Similar resources

Unsupervised Vocabulary Adaptation for Morph-based Language Models

Modeling of foreign entity names is an important unsolved problem in morpheme-based modeling that is common in morphologically rich languages. In this paper we present an unsupervised vocabulary adaptation method for morph-based speech recognition. Foreign word candidates are detected automatically from in-domain text through the use of letter n-gram perplexity. Over-segmented foreign entity na...

Unsupervised Language Model Adaptation for Lecture Speech Recognition

This paper addresses speaker adaptation of language model in large vocabulary spontaneous speech recognition. In spontaneous speech, the expression and pronunciation of words vary a lot depending on the speaker and topic. Therefore, we present unsupervised methods of language model adaptation to a specific speaker by (1) making direct use of the initial recognition result for generating an enha...

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Unsupervised language model adaptation based on topic and role information in multiparty meetings

We continue our previous work on the modeling of topic and role information from multiparty meetings using a hierarchical Dirichlet process (HDP), in the context of language model adaptation. In this paper we focus on three problems: 1) an empirical analysis of the HDP as a nonparametric topic model; 2) the mismatch problem of vocabularies of the baseline n-gram model and the HDP; and 3) an aut...

Publication date: 2013